Abstract. The Pulsar Virtual Observatory will provide a
means for scientists in all fields to access and analyse the 
large data sets stored in pulsar surveys without specific 
knowledge about the data or the processing mechanisms. 
This is achieved by moving the data and processing tools 
to a grid resource where the details of the processing are 
seen by the users as abstract tasks. By developing intelligent
scheduling middleware, the issues of interconnecting
tasks and allocating resources are removed from the user
domain. This opens up large sets of radio time-series data
to a wider audience, enabling greater cross-field astronomy,
in line with the virtual observatory concept. Implementation
of the Pulsar Virtual Observatory is underway,
utilising the UK National Grid Service as the principal 
grid resource. 



1. Background 

The concept of a Virtual Observatory (VO) is to enable
access to astronomical data archives via a generic interface
(Quinn & Gorski 2004). This interface can be a
well specified machine interface, e.g. an XML schema, or
a direct-to-user interface, e.g. an HTML interface. The purpose
of a generic interface is to reduce the level of domain-specific
knowledge needed about the data, enabling users from a wide
range of backgrounds to access it. This reduces
the difficulty of extracting and processing data from
many resources, allowing scientists to search for answers
across a wide range of data archives. Standardisation of
communication protocols allows for intercommunication
between many VO projects, further widening the scope of
research.

The Pulsar Virtual Observatory aims to apply the VO 
concept to data from archived pulsar surveys. This will en- 
able new pulsar science as astronomers and theoreticians 
can access data that would otherwise be hidden. Access 
will be made as simple as possible by providing a web 
interface that can be used by a standard web browser ap- 
plication. There will also be a range of options to allow 
users with different levels of experience to benefit fully 
from the Pulsar Virtual Observatory. 



2. Data archives 

The Pulsar Virtual Observatory has not been designed 
with any constraints over the data format that is used, 
and it is intended that processing can cover multiple data 
formats seamlessly. Data will be indexed by sky position 
as it is expected that users will search data at specific loca- 
tions of interest, and so access to multiple data catalogues 
that cover the same sky area is advantageous. 
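
Indexing by sky position can be illustrated with a minimal sketch, assuming each catalogue row records the archive it came from and the galactic longitude and latitude of the beam centre. The row layout, the archive and file names, and the function names below are hypothetical and are not taken from the Pulsar Virtual Observatory itself.

```python
import math


def angular_separation_deg(l1, b1, l2, b2):
    """Great-circle separation between two galactic positions, in degrees."""
    l1, b1, l2, b2 = map(math.radians, (l1, b1, l2, b2))
    # Haversine form, which stays accurate for small separations
    # and handles the wrap-around at longitude 0/360 degrees.
    h = (math.sin((b2 - b1) / 2) ** 2
         + math.cos(b1) * math.cos(b2) * math.sin((l2 - l1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def pointings_near(catalogue, l, b, radius_deg):
    """Return rows from every archive whose beam centre lies within radius_deg."""
    return [row for row in catalogue
            if angular_separation_deg(l, b, row["l"], row["b"]) <= radius_deg]


# Hypothetical catalogue rows drawn from more than one archive.
catalogue = [
    {"archive": "PMPS", "file": "beam-A", "l": 359.9, "b": 0.1},
    {"archive": "PMPS", "file": "beam-B", "l": 10.0, "b": -2.0},
    {"archive": "other-survey", "file": "beam-C", "l": 0.2, "b": -0.1},
]

hits = pointings_near(catalogue, l=0.0, b=0.0, radius_deg=0.5)
```

Because the query is expressed purely in sky coordinates, rows from different archives that cover the same area are returned together, whatever their underlying data format.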

Initially the data set that will be made available
via the Pulsar Virtual Observatory will be the Parkes
Multi-beam Pulsar Survey (PMPS) raw data archive
(Manchester et al. 2001). This is the most complete survey
of the galactic plane to date, and provides an ideal
resource for finding new pulsars and other transient radio
phenomena. Data was collected using the 13 beam HI
(21cm) receiver on the 64m Parkes telescope in Australia,
and the survey quickly became the most successful pulsar
survey to date. The archived data covers the galactic plane from
260° to 50° in galactic longitude and ±5° of galactic latitude.
The raw data archive is around 4.4 terabytes in size,
consisting of approximately 35000 integrations of 35 minutes
with a 288 MHz band centred on 1374 MHz. The
data archive has been extensively searched for periodic
signals, however more pulsars continue to be discovered with
in-depth targeted searches and by using more advanced
search tools (Faulkner et al. 2004).



This data is stored as raw data, as it is taken from 
the telescope, giving the maximum flexibility in analysing 
the data. Working with raw data means that the analysis 
can be performed with the latest algorithms, which are 
constantly being improved. This allows for expansion of 
the functionality of the Pulsar Virtual Observatory that 
would not be possible with archives of search results, or 
partially processed files. The Pulsar Virtual Observatory 
can also be expanded by adding new data archives to the 
system. This will enable new types of search, e.g. searching 
in different frequency bands, as well as expanding the sky 
coverage available. It is expected that more Parkes surveys 
can be added to the system when the data is made publicly 
available. 

There are some issues regarding data storage that need
to be addressed when developing this system. Currently
the raw data archives are stored on RAID disks at Jodrell
Bank Observatory. It is intended that these archives be
made available to the Pulsar Virtual Observatory via a
simple file transfer interface, however the most frequently
accessed data can be moved close to the processing nodes to
reduce data transfer times. The design of the Pulsar Virtual
Observatory makes it possible to store data in multiple
locations, allowing for replication of data for backup or
speed purposes. It is intended that the system also implement
a pre-fetching system, where the data is transferred
to a location near to the processing node before processing
starts. This can be achieved by transferring
the appropriate data file when the task is entered into a
processing queue. This reduces the processor time wasted
while waiting for the file to transfer to the node that is
performing the computation.
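
The pre-fetching idea described above can be sketched as follows. This is only an illustration under stated assumptions: the class name, the `stage_file` transfer hook and the use of a thread pool are all hypothetical, and the text notes elsewhere that pre-fetching is not yet implemented in the real system.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class PrefetchingQueue:
    """Task queue that starts copying a task's input file when the task is queued."""

    def __init__(self, stage_file, max_transfers=2):
        # stage_file(remote_path) -> local_path is a hypothetical transfer hook.
        self._stage_file = stage_file
        self._pool = ThreadPoolExecutor(max_workers=max_transfers)
        self._transfers = {}   # remote path -> Future resolving to the local path
        self._pending = deque()

    def submit(self, task_id, remote_path):
        # Start the transfer now, in the background, while the task waits its turn.
        if remote_path not in self._transfers:
            self._transfers[remote_path] = self._pool.submit(
                self._stage_file, remote_path)
        self._pending.append((task_id, remote_path))

    def next_task(self):
        """Return (task_id, local_path) for the oldest task, or None if empty."""
        if not self._pending:
            return None
        task_id, remote_path = self._pending.popleft()
        # Blocks only if the transfer has not finished by the time the task runs.
        return task_id, self._transfers[remote_path].result()

    def close(self):
        self._pool.shutdown(wait=True)
```

Tasks that share an input file trigger a single transfer, and a processing node only waits if its file has not arrived by the time the task reaches the front of the queue.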

3. Computation resources 

The Pulsar Virtual Observatory will provide access to 
compute resources as well as the large data archives. Pro- 
cessing will be made available via a simple web interface, 
directly linked to the searchable data archive. This design 
allows users to analyse data without the need to locate
and install software on their local systems. This enables
greater access to the data as many users do not have the 
knowledge, skills, resources or time required to install and 
run the appropriate software. Advanced users can still be 
given the opportunity to customise the software that is 
being used, or download the data to process by their own 
means. 

The current target system is the UK National Grid 
Service (NGS), a grid computing resource for UK science
projects. There is however nothing to prevent the Pulsar 
Virtual Observatory using other resources in combination 
with or instead of the NGS. 

The NGS is a dedicated high performance grid resource 
for the UK eScience programme. It comprises four core 
nodes at Manchester, Oxford and Leeds universities and at 
the Rutherford Appleton Laboratory. The NGS is suitable
as our initial target system as it provides
high performance computing and data storage facilities in 
a grid environment. The use of grid facilities makes inte- 
gration with the Pulsar Virtual Observatory easy, as each 
site can be accessed via a uniform interface. 

Because the Pulsar Virtual Observatory can use multi- 
ple resources for analysing data, the system is designed in 
such a way that the user does not need to know about the 
system their jobs are running on. This means that users 
of the Pulsar Virtual Observatory will not require sepa- 
rate authentication for each system that processing tasks 
are to be run on. By providing a simple user interface, it
is possible for users from outside the immediate scientific
field to analyse data, leading to more cross-field science, in 
line with the VO principles. It is intended that there will 
be direct access to the raw data if required, allowing ad- 
vanced users to develop their own processing algorithms. 

Telescope data can be searched with user specified pa- 
rameters, using a supplied set of search tools. For exam- 
ple, a user could select a number of telescope pointings 
and perform a Fourier analysis on a time-series generated 
by the dedispersion of these pointings. Users will be able 
to perform other types of searches, and new algorithms 
will be added when they are made available. 
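
The dedispersion and Fourier analysis example above can be sketched in a few lines. This is a minimal illustration and not the search software used by the Pulsar Virtual Observatory: the function names are hypothetical, delays are rounded to whole samples, and a naive discrete Fourier transform stands in for the FFT that real tools use. The only physical input is the standard cold-plasma dispersion delay, roughly 4.15 ms per unit dispersion measure (pc cm^-3) at 1 GHz, scaling as the inverse square of frequency.

```python
import cmath
import math

K_DM_MS = 4.148808  # dispersion constant, in ms GHz^2 per (pc cm^-3)


def dispersion_delay_ms(dm, freq_mhz, ref_mhz):
    """Arrival delay at freq_mhz relative to ref_mhz for dispersion measure dm."""
    f = freq_mhz / 1000.0
    r = ref_mhz / 1000.0
    return K_DM_MS * dm * (f ** -2 - r ** -2)


def dedisperse(channels, freqs_mhz, dm, tsamp_ms):
    """Undo the per-channel delay (to the nearest sample) and sum the channels."""
    ref = max(freqs_mhz)
    shifts = [round(dispersion_delay_ms(dm, f, ref) / tsamp_ms) for f in freqs_mhz]
    n = len(channels[0]) - max(shifts)
    return [sum(ch[t + s] for ch, s in zip(channels, shifts)) for t in range(n)]


def power_spectrum(series):
    """Naive O(n^2) DFT power spectrum of the mean-subtracted time-series."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(centred))) ** 2
        for k in range(n // 2 + 1)
    ]
```

A periodic signal that is smeared across frequency channels before dedispersion lines up afterwards, and then appears as a peak in the power spectrum at the bin corresponding to its period. Repeating the procedure over a range of trial dispersion measures is what makes the search computationally expensive.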

Fig. 1. A simple overview of the search process on the
Pulsar Virtual Observatory. 

The search parameters are user customisable, however
pre-defined default values will also be available for many
options. This means that expert users will be able to customise
the search routines for their specific task, while
still giving wide access to users with less experience. The system
will also provide feedback regarding the status and
remaining runtime of tasks, so that users will be
able to better optimise the settings for their jobs.

4. Implementation 

The core architecture of the Pulsar Virtual Observatory 
is based around a three layer design. 

The top layer handles user interaction and generates
the tasks that are required to process the user-selected 
data. It is also responsible for scheduling the tasks on the 
available compute resources. 
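
One way to picture the task generation performed by the top layer is the sketch below. It assumes, purely for illustration, that a user selection expands into one dedispersion task per pointing and trial dispersion measure, each followed by a Fourier search that depends on it. The `Task` fields, tool names and identifier format are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    tool: str
    inputs: list
    depends_on: list = field(default_factory=list)


def build_tasks(pointings, dms):
    """Expand a user selection into dedispersion tasks and dependent searches."""
    tasks = []
    for pointing in pointings:
        for dm in dms:
            dedisp_id = f"{pointing}/dedisperse/dm{dm}"
            tasks.append(Task(dedisp_id, "dedisperse", [pointing]))
            # The search consumes the dedispersed time-series, so it must wait.
            tasks.append(Task(f"{pointing}/search/dm{dm}", "fourier_search",
                              [dedisp_id], [dedisp_id]))
    return tasks


def runnable(tasks, finished):
    """Tasks not yet finished whose dependencies have all completed."""
    return [t for t in tasks
            if t.task_id not in finished
            and all(dep in finished for dep in t.depends_on)]
```

The scheduler then only needs to hand out whatever `runnable` returns, which keeps the interconnection of tasks out of the user's view.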

Compute resources are modelled by a web services
based queueing system, developed in parallel with the
Pulsar Virtual Observatory (Harbulot et al. 2006). The
queueing system is application independent, i.e. there is
no code that is specific to the Pulsar Virtual Observatory,
and platform independent, i.e. there is no code specific
to the compute resources that the tasks are being run
on. By having tasks pulled from the queues, rather than
pushed onto the processing resources, the queueing system
does not need any specialist code to interface with particular
resources. This means that all resources, whether
single machines, clusters or grids, are viewed through the
same interface.
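
The pull model can be illustrated with a small in-process stand-in for the queue. This is not the web services queueing system of Harbulot et al.; the class and method names are hypothetical, and a real deployment would expose the same operations over the network.

```python
import threading
from collections import deque


class PullQueue:
    """Queue that never contacts workers: every operation is initiated by a caller."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiting = deque()
        self._running = {}   # task id -> name reported by the claiming worker
        self._results = {}

    def add(self, task_id, payload):
        with self._lock:
            self._waiting.append((task_id, payload))

    def claim(self, worker_name):
        """Called by a worker; returns (task_id, payload) or None if idle."""
        with self._lock:
            if not self._waiting:
                return None
            task_id, payload = self._waiting.popleft()
            self._running[task_id] = worker_name
            return task_id, payload

    def complete(self, task_id, result):
        with self._lock:
            self._running.pop(task_id)
            self._results[task_id] = result

    def status(self):
        with self._lock:
            return {"waiting": len(self._waiting),
                    "running": len(self._running),
                    "done": len(self._results)}
```

The queue holds no addresses or credentials for any resource, so a single machine, a cluster node and a grid job all look identical from its side: each is simply a caller of `claim` and `complete`.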

The low level computation is handled by the final layer,
which is implemented as a series of simple shell scripts. This
approach allows for deployment on a wide range of systems
as it does not rely on any specialist software being
installed. As tasks are pulled from the queue system there
is no need to run the scripts in a listening mode. This
means there is no need for inbound communication, which
is often restricted, and the queueing server does not need
to know the physical location of the machine that is running
the processing script.
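
The behaviour of such a processing script can be sketched as a polling loop. The real layer is described as shell scripts; Python is used here only for illustration, and the `claim`, `execute` and `report` callables are hypothetical stand-ins for outbound requests to the queue and for running the processing tool.

```python
import time


def run_worker(claim, execute, report, idle_sleep=5.0,
               max_idle_polls=None, sleep=time.sleep):
    """Poll for work using outbound calls only; return the number of tasks handled.

    claim() returns (task_id, payload) or None when the queue is empty,
    execute(payload) returns a result or raises, and
    report(task_id, result, ok) sends the outcome back to the queue.
    """
    idle_polls = 0
    handled = 0
    while max_idle_polls is None or idle_polls < max_idle_polls:
        job = claim()
        if job is None:
            # Nothing to do: wait, then ask again. The worker never listens.
            idle_polls += 1
            sleep(idle_sleep)
            continue
        idle_polls = 0
        task_id, payload = job
        try:
            result, ok = execute(payload), True
        except Exception as exc:
            result, ok = repr(exc), False
        report(task_id, result, ok)
        handled += 1
    return handled
```

Every interaction starts from the worker, so it runs unchanged behind a firewall that blocks inbound connections, and a failed task is reported rather than silently lost.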

The user interface is a web interface that communicates
directly with the scheduler and with a database of the
available data archives and current process status. This
will make it possible to search the data catalogues and
submit new jobs, as well as monitor progress of running
jobs. The web interface will also provide tools for viewing
the results of the data analysis, although the user may
download the results to analyse with their own software.



Testing of the basic system is currently in progress, 
and it is expected that the system will soon be usable for 
a selection of trial problems. 

6. Conclusion 

The Pulsar Virtual Observatory will provide access to data 
and resources in order to enable greater use of archived 
pulsar survey data. This will allow users to analyse pulsar 
data in a simple and uniform manner, without detailed 
knowledge of data formats and processing software, en- 
couraging more science to be carried out. The Pulsar Vir- 
tual Observatory is designed to integrate with other VO 
projects and so enable integration of pulsar data archives 
with data sets from other astronomical fields. 

5. Current status

The Pulsar Virtual Observatory is not yet fully opera- 
tional, however a large fraction of the implementation is 
complete. 

There is a searchable database of the data available in 
the system, which is accessible via the web interface. Users
can select data and then choose and customise the processing
to be performed on it. This work is then scheduled
on the available processing nodes via a simple scheduling 
algorithm. Processing algorithms have also been success- 
fully deployed on the NGS, and test workflows have been 
completed. 

Many issues remain to be resolved however, including
error handling for failed jobs, user notification of job
status updates and remaining runtime estimation. Considerable
development work is still required to return
the output from the processing software to the user,
and to provide satisfactory visualisation tools. Data
pre-fetching is not yet implemented.



